MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers

Internal identifier: 002369 (Ncbi/Merge); previous: 002368; next: 002370

Authors: Kanak Mahadik [United States]; Christopher Wright [United States]; Milind Kulkarni [United States]; Saurabh Bagchi [United States]; Somali Chaterji [United States]

Source:

RBID: PMC:6795807

Abstract

Remarkable advancements in high-throughput gene sequencing technologies have led to an exponential growth in the number of sequenced genomes. However, the unavailability of highly parallel and scalable de novo assembly algorithms has hindered biologists attempting to swiftly assemble high-quality complex genomes. Popular de Bruijn graph assemblers, such as IDBA-UD, generate high-quality assemblies by iterating over a set of k-values used in the construction of de Bruijn graphs (DBG). However, this process of sequentially iterating from small to large k-values slows down the process of assembly. In this paper, we propose ScalaDBG, which metamorphoses this sequential process, building DBGs for each distinct k-value in parallel. We develop an innovative mechanism to “patch” a higher k-valued graph with contigs generated from a lower k-valued graph. Moreover, ScalaDBG leverages multi-level parallelism, by both scaling up on all cores of a node and scaling out to multiple nodes simultaneously. We demonstrate that ScalaDBG assembles genomes faster than IDBA-UD, with similar accuracy, on a variety of datasets (6.8X faster for one of the most complex genomes in our dataset).
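
The abstract's central idea can be illustrated with a toy Python sketch. Graphs for different k-values are built independently and concurrently, and each higher-k graph is then patched with contigs from the graph just below it. This is not ScalaDBG's or IDBA-UD's implementation. The names build_dbg, contigs, patch and assemble_multi_k are hypothetical. "Patching" is approximated here by inserting the k-mers of lower-k contigs into the higher-k graph. The sketch also ignores reverse complements, error correction and cycle-only components. Threads stand in for the multi-core and multi-node parallelism described in the paper.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_dbg(reads, k):
    """Toy de Bruijn graph: nodes are (k-1)-mers, each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def contigs(graph):
    """Spell maximal non-branching paths (unitigs); components that are pure cycles are ignored."""
    indeg = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            indeg[v] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(n):
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    result = []
    for n in nodes:
        if one_in_one_out(n):
            continue  # interior node: it is walked through from a path start
        for v in graph.get(n, ()):
            path = n + v[-1]
            while one_in_one_out(v):
                v = next(iter(graph[v]))
                path += v[-1]
            result.append(path)
    return sorted(result)


def patch(graph, lower_contigs, k):
    """Hypothetical patch step: add the k-mers of lower-k contigs to the k-graph in place."""
    for contig in lower_contigs:
        for i in range(len(contig) - k + 1):
            kmer = contig[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_multi_k(reads, ks, executor_cls=ThreadPoolExecutor):
    """Build one graph per k concurrently, then patch from the smallest k upward."""
    ks = sorted(ks)
    # Graph construction for each k is independent, so the builds run concurrently.
    with executor_cls() as pool:
        graphs = list(pool.map(build_dbg, [reads] * len(ks), ks))
    # Only the lightweight patch/contig steps remain sequential.
    current = contigs(graphs[0])
    for k, graph in zip(ks[1:], graphs[1:]):
        current = contigs(patch(graph, current, k))
    return current
```

For example, assemble_multi_k(["ATGGCGTGCA"], [3, 4, 5]) returns ["ATGGCGTGCA"]. At k = 3 the repeated 2-mers TG and GC make the graph branch, and it yields five short contigs. At k = 4 and k = 5 the read resolves into a single contig. This is the usual reason multi-k assemblers combine small and large k-values. Threads keep the sketch portable, but CPython's GIL limits their CPU speedup. Because build_dbg is a top-level function, ProcessPoolExecutor can be passed as executor_cls instead, from a script guarded by `if __name__ == "__main__":`.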


Url:
DOI: 10.1038/s41598-019-51284-9
PubMed: 31619717
PubMed Central: 6795807

Links to Exploration step

PMC:6795807

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Scalable Genome Assembly through Parallel
<italic>de Bruijn</italic>
Graph Construction for Multiple
<italic>k</italic>
-mers</title>
<author>
<name sortKey="Mahadik, Kanak" sort="Mahadik, Kanak" uniqKey="Mahadik K" first="Kanak" last="Mahadik">Kanak Mahadik</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">Adobe Research, San Jose, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Adobe Research, San Jose</wicri:regionArea>
<wicri:noRegion>San Jose</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Wright, Christopher" sort="Wright, Christopher" uniqKey="Wright C" first="Christopher" last="Wright">Christopher Wright</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Kulkarni, Milind" sort="Kulkarni, Milind" uniqKey="Kulkarni M" first="Milind" last="Kulkarni">Milind Kulkarni</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Bagchi, Saurabh" sort="Bagchi, Saurabh" uniqKey="Bagchi S" first="Saurabh" last="Bagchi">Saurabh Bagchi</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Chaterji, Somali" sort="Chaterji, Somali" uniqKey="Chaterji S" first="Somali" last="Chaterji">Somali Chaterji</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">31619717</idno>
<idno type="pmc">6795807</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6795807</idno>
<idno type="RBID">PMC:6795807</idno>
<idno type="doi">10.1038/s41598-019-51284-9</idno>
<date when="2019">2019</date>
<idno type="wicri:Area/Pmc/Corpus">000461</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000461</idno>
<idno type="wicri:Area/Pmc/Curation">000461</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000461</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000190</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000190</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:31619717</idno>
<idno type="wicri:Area/PubMed/Corpus">000393</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000393</idno>
<idno type="wicri:Area/PubMed/Curation">000393</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000393</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000392</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000392</idno>
<idno type="wicri:Area/Ncbi/Merge">002369</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Scalable Genome Assembly through Parallel
<italic>de Bruijn</italic>
Graph Construction for Multiple
<italic>k</italic>
-mers</title>
<author>
<name sortKey="Mahadik, Kanak" sort="Mahadik, Kanak" uniqKey="Mahadik K" first="Kanak" last="Mahadik">Kanak Mahadik</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">Adobe Research, San Jose, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Adobe Research, San Jose</wicri:regionArea>
<wicri:noRegion>San Jose</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Wright, Christopher" sort="Wright, Christopher" uniqKey="Wright C" first="Christopher" last="Wright">Christopher Wright</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Kulkarni, Milind" sort="Kulkarni, Milind" uniqKey="Kulkarni M" first="Milind" last="Kulkarni">Milind Kulkarni</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Bagchi, Saurabh" sort="Bagchi, Saurabh" uniqKey="Bagchi S" first="Saurabh" last="Bagchi">Saurabh Bagchi</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Chaterji, Somali" sort="Chaterji, Somali" uniqKey="Chaterji S" first="Somali" last="Chaterji">Somali Chaterji</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Scientific Reports</title>
<idno type="eISSN">2045-2322</idno>
<imprint>
<date when="2019">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p id="Par1">Remarkable advancements in high-throughput gene sequencing technologies have led to an exponential growth in the number of sequenced genomes. However, unavailability of highly parallel and scalable
<italic>de novo</italic>
assembly algorithms has hindered biologists attempting to swiftly assemble high-quality complex genomes. Popular
<italic>de Bruijn</italic>
graph assemblers, such as IDBA-UD, generate high-quality assemblies by iterating over a set of
<italic>k</italic>
-values used in the construction of de Bruijn graphs (DBG). However, this process of
<italic>sequentially</italic>
iterating from small to large
<italic>k</italic>
-values slows down the process of assembly. In this paper, we propose ScalaDBG, which metamorphoses this sequential process, building DBGs for each distinct
<italic>k</italic>
-value in parallel. We develop an innovative mechanism to “patch” a higher
<italic>k</italic>
-valued graph with contigs generated from a lower
<italic>k</italic>
-valued graph. Moreover, ScalaDBG leverages multi-level parallelism, by both scaling up on all cores of a node, and scaling out to multiple nodes
<italic>simultaneously</italic>
. We demonstrate that ScalaDBG completes assembling the genome faster than IDBA-UD, but with similar accuracy on a variety of datasets (6.8X faster for one of the most complex genomes in our dataset).</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Stephens, Zd" uniqKey="Stephens Z">ZD Stephens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simpson, Jt" uniqKey="Simpson J">JT Simpson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gnerre, S" uniqKey="Gnerre S">S Gnerre</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Compeau, Pe" uniqKey="Compeau P">PE Compeau</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author>
<name sortKey="Tesler, G" uniqKey="Tesler G">G Tesler</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peng, Y" uniqKey="Peng Y">Y Peng</name>
</author>
<author>
<name sortKey="Leung, Hc" uniqKey="Leung H">HC Leung</name>
</author>
<author>
<name sortKey="Yiu, S M" uniqKey="Yiu S">S-M Yiu</name>
</author>
<author>
<name sortKey="Chin, Fy" uniqKey="Chin F">FY Chin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Luo, R" uniqKey="Luo R">R Luo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bankevich, A" uniqKey="Bankevich A">A Bankevich</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chaterji, S" uniqKey="Chaterji S">S Chaterji</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boisvert, S" uniqKey="Boisvert S">S Boisvert</name>
</author>
<author>
<name sortKey="Laviolette, F" uniqKey="Laviolette F">F Laviolette</name>
</author>
<author>
<name sortKey="Corbeil, J" uniqKey="Corbeil J">J Corbeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
<author>
<name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
<author>
<name sortKey="Maskell, Dl" uniqKey="Maskell D">DL Maskell</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peng, Y" uniqKey="Peng Y">Y Peng</name>
</author>
<author>
<name sortKey="Leung, Hc" uniqKey="Leung H">HC Leung</name>
</author>
<author>
<name sortKey="Yiu, S M" uniqKey="Yiu S">S-M Yiu</name>
</author>
<author>
<name sortKey="Chin, Fy" uniqKey="Chin F">FY Chin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simpson, Jt" uniqKey="Simpson J">JT Simpson</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gurevich, A" uniqKey="Gurevich A">A Gurevich</name>
</author>
<author>
<name sortKey="Saveliev, V" uniqKey="Saveliev V">V Saveliev</name>
</author>
<author>
<name sortKey="Vyahhi, N" uniqKey="Vyahhi N">N Vyahhi</name>
</author>
<author>
<name sortKey="Tesler, G" uniqKey="Tesler G">G Tesler</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<double pmid="31619717">
<pmc>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Scalable Genome Assembly through Parallel
<italic>de Bruijn</italic>
Graph Construction for Multiple
<italic>k</italic>
-mers</title>
<author>
<name sortKey="Mahadik, Kanak" sort="Mahadik, Kanak" uniqKey="Mahadik K" first="Kanak" last="Mahadik">Kanak Mahadik</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">Adobe Research, San Jose, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Adobe Research, San Jose</wicri:regionArea>
<wicri:noRegion>San Jose</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Wright, Christopher" sort="Wright, Christopher" uniqKey="Wright C" first="Christopher" last="Wright">Christopher Wright</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Kulkarni, Milind" sort="Kulkarni, Milind" uniqKey="Kulkarni M" first="Milind" last="Kulkarni">Milind Kulkarni</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Bagchi, Saurabh" sort="Bagchi, Saurabh" uniqKey="Bagchi S" first="Saurabh" last="Bagchi">Saurabh Bagchi</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Chaterji, Somali" sort="Chaterji, Somali" uniqKey="Chaterji S" first="Somali" last="Chaterji">Somali Chaterji</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">31619717</idno>
<idno type="pmc">6795807</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6795807</idno>
<idno type="RBID">PMC:6795807</idno>
<idno type="doi">10.1038/s41598-019-51284-9</idno>
<date when="2019">2019</date>
<idno type="wicri:Area/Pmc/Corpus">000461</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000461</idno>
<idno type="wicri:Area/Pmc/Curation">000461</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000461</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000190</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000190</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Scalable Genome Assembly through Parallel
<italic>de Bruijn</italic>
Graph Construction for Multiple
<italic>k</italic>
-mers</title>
<author>
<name sortKey="Mahadik, Kanak" sort="Mahadik, Kanak" uniqKey="Mahadik K" first="Kanak" last="Mahadik">Kanak Mahadik</name>
<affiliation wicri:level="1">
<nlm:aff id="Aff1">Adobe Research, San Jose, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Adobe Research, San Jose</wicri:regionArea>
<wicri:noRegion>San Jose</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Wright, Christopher" sort="Wright, Christopher" uniqKey="Wright C" first="Christopher" last="Wright">Christopher Wright</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Kulkarni, Milind" sort="Kulkarni, Milind" uniqKey="Kulkarni M" first="Milind" last="Kulkarni">Milind Kulkarni</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Bagchi, Saurabh" sort="Bagchi, Saurabh" uniqKey="Bagchi S" first="Saurabh" last="Bagchi">Saurabh Bagchi</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
<author>
<name sortKey="Chaterji, Somali" sort="Chaterji, Somali" uniqKey="Chaterji S" first="Somali" last="Chaterji">Somali Chaterji</name>
<affiliation wicri:level="2">
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 1937 2197</institution-id>
<institution-id institution-id-type="GRID">grid.169077.e</institution-id>
<institution>Purdue University,</institution>
</institution-wrap>
West Lafayette, IN USA</nlm:aff>
<country>États-Unis</country>
<placeName>
<region type="state">Indiana</region>
</placeName>
<wicri:cityArea>West Lafayette</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Scientific Reports</title>
<idno type="eISSN">2045-2322</idno>
<imprint>
<date when="2019">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p id="Par1">Remarkable advancements in high-throughput gene sequencing technologies have led to an exponential growth in the number of sequenced genomes. However, unavailability of highly parallel and scalable
<italic>de novo</italic>
assembly algorithms has hindered biologists attempting to swiftly assemble high-quality complex genomes. Popular
<italic>de Bruijn</italic>
graph assemblers, such as IDBA-UD, generate high-quality assemblies by iterating over a set of
<italic>k</italic>
-values used in the construction of de Bruijn graphs (DBG). However, this process of
<italic>sequentially</italic>
iterating from small to large
<italic>k</italic>
-values slows down the process of assembly. In this paper, we propose ScalaDBG, which metamorphoses this sequential process, building DBGs for each distinct
<italic>k</italic>
-value in parallel. We develop an innovative mechanism to “patch” a higher
<italic>k</italic>
-valued graph with contigs generated from a lower
<italic>k</italic>
-valued graph. Moreover, ScalaDBG leverages multi-level parallelism, by both scaling up on all cores of a node, and scaling out to multiple nodes
<italic>simultaneously</italic>
. We demonstrate that ScalaDBG completes assembling the genome faster than IDBA-UD, but with similar accuracy on a variety of datasets (6.8X faster for one of the most complex genomes in our dataset).</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Stephens, Zd" uniqKey="Stephens Z">ZD Stephens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zerbino, Dr" uniqKey="Zerbino D">DR Zerbino</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simpson, Jt" uniqKey="Simpson J">JT Simpson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gnerre, S" uniqKey="Gnerre S">S Gnerre</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Compeau, Pe" uniqKey="Compeau P">PE Compeau</name>
</author>
<author>
<name sortKey="Pevzner, Pa" uniqKey="Pevzner P">PA Pevzner</name>
</author>
<author>
<name sortKey="Tesler, G" uniqKey="Tesler G">G Tesler</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peng, Y" uniqKey="Peng Y">Y Peng</name>
</author>
<author>
<name sortKey="Leung, Hc" uniqKey="Leung H">HC Leung</name>
</author>
<author>
<name sortKey="Yiu, S M" uniqKey="Yiu S">S-M Yiu</name>
</author>
<author>
<name sortKey="Chin, Fy" uniqKey="Chin F">FY Chin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Luo, R" uniqKey="Luo R">R Luo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bankevich, A" uniqKey="Bankevich A">A Bankevich</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chaterji, S" uniqKey="Chaterji S">S Chaterji</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boisvert, S" uniqKey="Boisvert S">S Boisvert</name>
</author>
<author>
<name sortKey="Laviolette, F" uniqKey="Laviolette F">F Laviolette</name>
</author>
<author>
<name sortKey="Corbeil, J" uniqKey="Corbeil J">J Corbeil</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, Y" uniqKey="Liu Y">Y Liu</name>
</author>
<author>
<name sortKey="Schmidt, B" uniqKey="Schmidt B">B Schmidt</name>
</author>
<author>
<name sortKey="Maskell, Dl" uniqKey="Maskell D">DL Maskell</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peng, Y" uniqKey="Peng Y">Y Peng</name>
</author>
<author>
<name sortKey="Leung, Hc" uniqKey="Leung H">HC Leung</name>
</author>
<author>
<name sortKey="Yiu, S M" uniqKey="Yiu S">S-M Yiu</name>
</author>
<author>
<name sortKey="Chin, Fy" uniqKey="Chin F">FY Chin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simpson, Jt" uniqKey="Simpson J">JT Simpson</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gurevich, A" uniqKey="Gurevich A">A Gurevich</name>
</author>
<author>
<name sortKey="Saveliev, V" uniqKey="Saveliev V">V Saveliev</name>
</author>
<author>
<name sortKey="Vyahhi, N" uniqKey="Vyahhi N">N Vyahhi</name>
</author>
<author>
<name sortKey="Tesler, G" uniqKey="Tesler G">G Tesler</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
</pmc>
<pubmed>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers.</title>
<author>
<name sortKey="Mahadik, Kanak" sort="Mahadik, Kanak" uniqKey="Mahadik K" first="Kanak" last="Mahadik">Kanak Mahadik</name>
<affiliation wicri:level="1">
<nlm:affiliation>Adobe Research, San Jose, USA. mahadik@adobe.com.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Adobe Research, San Jose</wicri:regionArea>
<wicri:noRegion>San Jose</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Wright, Christopher" sort="Wright, Christopher" uniqKey="Wright C" first="Christopher" last="Wright">Christopher Wright</name>
<affiliation wicri:level="2">
<nlm:affiliation>Purdue University, West Lafayette, IN, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Purdue University, West Lafayette, IN</wicri:regionArea>
<placeName>
<region type="state">Indiana</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Kulkarni, Milind" sort="Kulkarni, Milind" uniqKey="Kulkarni M" first="Milind" last="Kulkarni">Milind Kulkarni</name>
<affiliation wicri:level="2">
<nlm:affiliation>Purdue University, West Lafayette, IN, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Purdue University, West Lafayette, IN</wicri:regionArea>
<placeName>
<region type="state">Indiana</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Bagchi, Saurabh" sort="Bagchi, Saurabh" uniqKey="Bagchi S" first="Saurabh" last="Bagchi">Saurabh Bagchi</name>
<affiliation wicri:level="2">
<nlm:affiliation>Purdue University, West Lafayette, IN, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Purdue University, West Lafayette, IN</wicri:regionArea>
<placeName>
<region type="state">Indiana</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Chaterji, Somali" sort="Chaterji, Somali" uniqKey="Chaterji S" first="Somali" last="Chaterji">Somali Chaterji</name>
<affiliation wicri:level="2">
<nlm:affiliation>Purdue University, West Lafayette, IN, USA. schaterji@schaterji.io.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Purdue University, West Lafayette, IN</wicri:regionArea>
<placeName>
<region type="state">Indiana</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2019">2019</date>
<idno type="RBID">pubmed:31619717</idno>
<idno type="pmid">31619717</idno>
<idno type="doi">10.1038/s41598-019-51284-9</idno>
<idno type="wicri:Area/PubMed/Corpus">000393</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000393</idno>
<idno type="wicri:Area/PubMed/Curation">000393</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000393</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000392</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">000392</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers.</title>
<author>
<name sortKey="Mahadik, Kanak" sort="Mahadik, Kanak" uniqKey="Mahadik K" first="Kanak" last="Mahadik">Kanak Mahadik</name>
<affiliation wicri:level="1">
<nlm:affiliation>Adobe Research, San Jose, USA. mahadik@adobe.com.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Adobe Research, San Jose</wicri:regionArea>
<wicri:noRegion>San Jose</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Wright, Christopher" sort="Wright, Christopher" uniqKey="Wright C" first="Christopher" last="Wright">Christopher Wright</name>
<affiliation wicri:level="2">
<nlm:affiliation>Purdue University, West Lafayette, IN, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Purdue University, West Lafayette, IN</wicri:regionArea>
<placeName>
<region type="state">Indiana</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Kulkarni, Milind" sort="Kulkarni, Milind" uniqKey="Kulkarni M" first="Milind" last="Kulkarni">Milind Kulkarni</name>
<affiliation wicri:level="2">
<nlm:affiliation>Purdue University, West Lafayette, IN, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Purdue University, West Lafayette, IN</wicri:regionArea>
<placeName>
<region type="state">Indiana</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Bagchi, Saurabh" sort="Bagchi, Saurabh" uniqKey="Bagchi S" first="Saurabh" last="Bagchi">Saurabh Bagchi</name>
<affiliation wicri:level="2">
<nlm:affiliation>Purdue University, West Lafayette, IN, USA.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Purdue University, West Lafayette, IN</wicri:regionArea>
<placeName>
<region type="state">Indiana</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Chaterji, Somali" sort="Chaterji, Somali" uniqKey="Chaterji S" first="Somali" last="Chaterji">Somali Chaterji</name>
<affiliation wicri:level="2">
<nlm:affiliation>Purdue University, West Lafayette, IN, USA. schaterji@schaterji.io.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Purdue University, West Lafayette, IN</wicri:regionArea>
<placeName>
<region type="state">Indiana</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Scientific reports</title>
<idno type="eISSN">2045-2322</idno>
<imprint>
<date when="2019" type="published">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Remarkable advancements in high-throughput gene sequencing technologies have led to an exponential growth in the number of sequenced genomes. However, unavailability of highly parallel and scalable de novo assembly algorithms has hindered biologists attempting to swiftly assemble high-quality complex genomes. Popular de Bruijn graph assemblers, such as IDBA-UD, generate high-quality assemblies by iterating over a set of k-values used in the construction of de Bruijn graphs (DBG). However, this process of sequentially iterating from small to large k-values slows down the process of assembly. In this paper, we propose ScalaDBG, which metamorphoses this sequential process, building DBGs for each distinct k-value in parallel. We develop an innovative mechanism to "patch" a higher k-valued graph with contigs generated from a lower k-valued graph. Moreover, ScalaDBG leverages multi-level parallelism, by both scaling up on all cores of a node, and scaling out to multiple nodes simultaneously. We demonstrate that ScalaDBG completes assembling the genome faster than IDBA-UD, but with similar accuracy on a variety of datasets (6.8X faster for one of the most complex genomes in our dataset).</div>
</front>
</TEI>
</pubmed>
</double>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Ncbi/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002369 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd -nk 002369 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Ncbi
   |étape=   Merge
   |type=    RBID
   |clé=     PMC:6795807
   |texte=   Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Ncbi/Merge/RBID.i   -Sk "pubmed:31619717" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Ncbi/Merge/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021